Design and Analysis of Joint Group Shuffled Scheduling Decoding Algorithm for Double LDPC Codes System

In this paper, a joint group shuffled scheduling decoding (JGSSD) algorithm is presented for a joint source-channel coding (JSCC) scheme based on double low-density parity-check (D-LDPC) codes. The proposed algorithm treats the D-LDPC coding structure as a whole and applies shuffled scheduling group by group, where the grouping relies on the types or the length of the variable nodes (VNs). The conventional shuffled scheduling decoding algorithm can be regarded as a special case of the proposed algorithm. A novel joint shuffled extrinsic information transfer (JSEXIT) algorithm is also proposed for the D-LDPC codes system with the JGSSD algorithm. It calculates the source and channel decoding thresholds under different grouping strategies, so that the effect of grouping can be analyzed. Simulation results and comparisons verify the superiority of the JGSSD algorithm, which can adaptively trade off decoding performance, complexity and latency.


Introduction
In an integrated communication system [1], Shannon's separation principle indicates that separate source and channel coding can attain arbitrarily high reliability in the limit of infinite source and channel code block lengths. In the nonasymptotic regime, a joint source-channel coding (JSCC) [2] design can be more attractive, because source redundancy and channel information can be exchanged iteratively to improve decoding performance. A JSCC system can also reduce decoding complexity and transmission delay, which has led to its successful application in image and video transmission [3,4].
A JSCC scheme based on double low-density parity-check (D-LDPC) codes was proposed in [2], in which one LDPC code is used for source compression and another for channel error control. Since each LDPC code can be represented by a Tanner graph, a belief propagation (BP) decoding algorithm can be applied. The source and channel decoders both perform BP decoding and pass decoding information to each other to accomplish the information exchange. However, this is a parallel decoding method that requires multiple calculation units working simultaneously. Such high decoding complexity is not suitable for low-complexity green communication systems, e.g., the Internet of Things (IoT) [5].

Related Work and Motivation of Shuffled Scheduling Decoding
Recently, shuffled scheduling decoding [6], a kind of serial decoding, was introduced into the D-LDPC codes system [7]. It can lower the decoding hardware complexity and reduce the number of decoding iterations (one iteration means that all variable nodes (VNs) and check nodes (CNs) are updated once). To take full advantage of the linking relationship, a more generalized shuffled scheduling decoding algorithm was proposed [8]. However, this algorithm applies the shuffled scheduling strategy to source decoding and channel decoding separately, which is in fact a separated decoding method; we denote it as the separated shuffled scheduling decoding (SSSD) algorithm. Compared with the conventional BP decoding algorithm, the SSSD algorithm has a higher decoding latency.
In this paper, a joint shuffled scheduling decoding algorithm is proposed for the D-LDPC codes system from a global viewpoint. The proposed algorithm is a generalization of the SSSD algorithm: it treats the Tanner graphs of the source and channel coding structures as one joint Tanner graph and applies shuffled decoding to it. Moreover, the joint Tanner graph can be divided into a number of groups to trade off decoding performance and decoding latency, where the grouping relies on the types or the length of the VNs. We refer to this algorithm as the joint group shuffled scheduling decoding (JGSSD) algorithm. In particular, if the grouping keeps the source part and the channel part apart and applies shuffled scheduling within each part, the JGSSD algorithm reduces to the SSSD algorithm.

Related Work on D-LDPC Codes Systems
In recent years, a large amount of research has focused on the D-LDPC codes system [9][10][11]. The main difference between the D-LDPC system and single channel LDPC code systems is the consideration of nonuniform sources and source coding. In the D-LDPC codes system, source coding may cause a loss of original information and trigger an error floor. To improve performance in the error floor region, a linking matrix is set up between the variable nodes (VNs) of the source code and the check nodes (CNs) of the channel code [12], so that more original information participates in the coding. The improvement of the error floor can be evaluated by the source decoding threshold and analyzed with the source protograph EXIT (SPEXIT) algorithm [13]. The linking matrix has been further optimized for high-entropy sources [14]. The source LDPC coding matrix can also be optimized to match the source statistics [15], or optimized jointly with the linking matrix [16].
On the other hand, the source redundancy left after source coding affects the specific structure of the D-LDPC codes. The effect of the source statistics over the Rayleigh fading channel has been analyzed in comparison with reception diversity [17]. The channel decoding threshold can be evaluated using the joint protograph EXIT (JPEXIT) algorithm, with which the channel protograph LDPC (P-LDPC) codes [18] and the allocation [19,20] of an important structure, i.e., degree-2 VNs [21], have been redesigned. Several works have also addressed joint component design, including optimized source and channel code pairs [22] and the joint coding matrix [23,24].
In addition, an information shortening strategy was proposed to reduce the effects of short cycles in the Tanner graph [25]. The D-LDPC codes system has also been considered over several nonstandard channels [26][27][28]. Spatially coupled LDPC codes were introduced into the D-LDPC codes system [29]; they allow sliding window decoding (SWD), which significantly reduces latency and complexity. An SWD algorithm with a variable window size was proposed [30] to balance performance and complexity. The D-LDPC codes system has also been applied to image transmission [31,32].

Main Contribution
The aforementioned D-LDPC codes systems mostly use BP decoding. In this paper, a joint decoding viewpoint is introduced, and shuffled scheduling decoding for the D-LDPC codes system is generalized.
The novelty and contributions of this paper can be summarized as follows: (1) From a global viewpoint, the D-LDPC coding structure is considered as a whole, and a joint shuffled scheduling decoding strategy is introduced to the D-LDPC codes system. (2) A grouping method for the joint shuffled scheduling decoding strategy, based on the types or the length of the VNs, is introduced. (3) A novel EXIT algorithm is proposed to calculate the channel and source decoding thresholds of the general D-LDPC coding structure with the JGSSD algorithm. (4) The SSSD and JGSSD algorithms are compared in terms of decoding performance, decoding complexity and decoding latency.
The main differences between the present work and previous work are shown in Table 1.

Paper Organization
The remainder of the paper is organized as follows. Section 2 describes the preliminaries of the D-LDPC codes. The joint shuffled scheduling decoding algorithm with the grouping strategy is proposed in Section 3. An EXIT algorithm for analyzing the D-LDPC codes system with the JGSSD algorithm is presented in Section 4. Simulations and comparisons are presented in Section 5. Finally, Section 6 draws the conclusions of this paper.

The D-LDPC Coding Structure
An LDPC code can be represented by a protograph, i.e., a small protomatrix $B = [b_{ij}]$, where $b_{ij}$ is the number of edges connecting VN $v_j$ to CN $c_i$. A large parity-check matrix can then be obtained from the protomatrix by a "copy-and-permute" (lifting) operation with a lifting factor, e.g., using the progressive edge growth (PEG) algorithm [33].
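To make the "copy-and-permute" step concrete, the following is a minimal Python/NumPy sketch of lifting a protomatrix. It is a simplified stand-in, not the PEG construction used in the text: each entry $b_{ij}$ is replaced by the sum of $b_{ij}$ distinct circulant permutation matrices with random shifts. The function name `lift_protograph` is hypothetical.

```python
import numpy as np

def lift_protograph(B, Z, seed=0):
    """Copy-and-permute lifting of protomatrix B with lifting factor Z (sketch).

    Each entry b_ij becomes the sum of b_ij distinct Z x Z circulant
    permutation matrices with random shifts (a stand-in for PEG).
    Distinct shifts never overlap, so the lifted matrix stays binary.
    """
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=int)
    m, n = B.shape
    if B.max() > Z:
        raise ValueError("each b_ij must not exceed the lifting factor Z")
    identity = np.eye(Z, dtype=np.uint8)
    H = np.zeros((m * Z, n * Z), dtype=np.uint8)
    for i in range(m):
        for j in range(n):
            shifts = rng.choice(Z, size=B[i, j], replace=False)
            for s in shifts:
                # Rolling the identity along columns gives a circulant permutation
                H[i * Z:(i + 1) * Z, j * Z:(j + 1) * Z] += np.roll(identity, s, axis=1)
    return H

# Usage: column weights follow the column sums of B (4 and 2), row weights 3
H = lift_protograph([[1, 2], [3, 0]], Z=5)
print(H.shape, H.sum(axis=0), H.sum(axis=1))
```

Because every block is a sum of permutation matrices, the lifted matrix preserves the degree distribution of the protograph; PEG would additionally choose the connections to enlarge the girth.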
A D-LDPC code can be represented by a joint protograph $B_J$ as follows:
$$B_J = \begin{bmatrix} B_s & B_{l1} \\ B_{l2} & B_c \end{bmatrix},$$
where $B_s$ is the source coding protomatrix of size $m_s \times n_s$, $B_c$ is the channel coding protomatrix of size $m_c \times n_c$, $B_{l1} = [I\ 0]$ ($I$ is an identity matrix) is the source-check-channel-variable linking protomatrix of size $m_s \times n_c$, and $B_{l2}$ is the source-variable-channel-check linking protomatrix of size $m_c \times n_s$. Correspondingly, the joint parity-check matrix $H_J$ is
$$H_J = \begin{bmatrix} H_s & H_{l1} \\ H_{l2} & H_c \end{bmatrix},$$
where $H_s$ is the source coding matrix of size $M_s \times N_s$, $H_c$ is the channel coding matrix of size $M_c \times N_c$, $H_{l1} = [I\ 0]$ is the source-check-channel-variable linking matrix of size $M_s \times N_c$, and $H_{l2}$ is the source-variable-channel-check linking matrix of size $M_c \times N_s$. The overall code rate of the D-LDPC codes is given by
$$R = \frac{R_c}{R_s},$$
where $R_s = M_s/N_s$ is the source compression rate and $R_c$ is the rate of the channel code.
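The block structure of $H_J$ and the rate relation can be checked numerically. The sketch below is a minimal NumPy illustration that assumes $H_{l1} = [I\ 0]$ as stated, and, for the rate, a full-rank $H_c$ without puncturing, so that $R_c = (N_c - M_c)/N_c$. The function names are hypothetical.

```python
import numpy as np

def joint_parity_check(H_s, H_c, H_l2=None):
    """Assemble H_J = [[H_s, H_l1], [H_l2, H_c]] with H_l1 = [I 0] (sketch)."""
    H_s = np.asarray(H_s, dtype=np.uint8)
    H_c = np.asarray(H_c, dtype=np.uint8)
    M_s, N_s = H_s.shape
    M_c, N_c = H_c.shape
    if M_s > N_c:
        raise ValueError("H_l1 = [I 0] requires M_s <= N_c")
    H_l1 = np.zeros((M_s, N_c), dtype=np.uint8)
    H_l1[:, :M_s] = np.eye(M_s, dtype=np.uint8)   # source CN i -> channel VN i
    if H_l2 is None:                                # the B_l2 = 0 case
        H_l2 = np.zeros((M_c, N_s), dtype=np.uint8)
    H_l2 = np.asarray(H_l2, dtype=np.uint8)
    return np.block([[H_s, H_l1], [H_l2, H_c]])

def overall_rate(M_s, N_s, M_c, N_c):
    """R = R_c / R_s, assuming a full-rank H_c and no punctured bits."""
    R_s = M_s / N_s                  # source compression rate
    R_c = (N_c - M_c) / N_c          # channel code rate
    return R_c / R_s
```

With $M_s = N_c - M_c$ (the compressed bits fill exactly the channel information positions), the rate simplifies to $N_s/N_c$ source bits per transmitted bit.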

Transmission System Model
Assume that the original source bits $\mathbf{s} \in \{0, 1\}^{1 \times N_s}$ are generated by a binary independent and identically distributed (i.i.d.) Bernoulli source, where the probability of "1" is $\eta$. The encoding procedure with a nonzero $H_{l2}$ is as follows. First, the compressed source bits $\mathbf{c}$ are obtained by
$$\mathbf{c} = \mathbf{s} H_s^T,$$
where $(\cdot)^T$ denotes matrix transposition and all operations are over GF(2). Then, the channel codeword $\mathbf{d}$ is generated so that its first $M_s$ bits equal $\mathbf{c}$ and $\mathbf{s} H_{l2}^T + \mathbf{d} H_c^T = \mathbf{0}$, which gives the joint codeword $\mathbf{u} = [\mathbf{s}\ \mathbf{d}]$. In the decoding process, binary phase-shift keying (BPSK) modulation over an additive white Gaussian noise (AWGN) channel is assumed, and the log-likelihood ratio (LLR) values of all VNs are first initialized. Then, as shown in Figure 1, the iterative BP (IBP) algorithm is performed as follows:
(1) Update the check-to-variable (C2V) messages of each of the $M_c$ CNs in the channel part;
(2) Update the variable-to-check (V2C) messages of each of the $N_s$ VNs in the source part;
(3) Update the C2V messages of each of the $M_s$ CNs in the source part;
(4) Update the V2C messages of each of the $N_c$ VNs in the channel part;
(5) Exchange decoding information between the source part and the channel part through $H_{l1}$ and $H_{l2}$ (i.e., the dashed blue and red lines in Figure 1);
(6) Estimate the codeword $\hat{\mathbf{u}}$ based on the posterior LLRs at the VNs;
(7) Repeat Steps (1) to (6) until (i) the estimates $\hat{\mathbf{s}}$ and $\hat{\mathbf{d}}$ satisfy $\hat{\mathbf{s}} H_s^T + \hat{\mathbf{d}} H_{l1}^T = \mathbf{0}$ and $\hat{\mathbf{s}} H_{l2}^T + \hat{\mathbf{d}} H_c^T = \mathbf{0}$, or (ii) the maximum number of iterations is reached.
For the SSSD algorithm in [7,8], the C2V updates in the channel part and the source part, i.e., Steps (1) and (3), are performed using shuffled scheduling. For more details about the decoding procedure, the reader can refer to [8].
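The encoding rule can be illustrated with a small GF(2) sketch. It computes $\mathbf{c} = \mathbf{s}H_s^T$ and then solves, by Gaussian elimination over GF(2), for a channel codeword $\mathbf{d}$ whose first $M_s$ bits equal $\mathbf{c}$ and that satisfies $\mathbf{s}H_{l2}^T + \mathbf{d}H_c^T = \mathbf{0}$. This is a generic linear-algebra encoder chosen for clarity, not the encoder used in the paper. `gf2_solve` and `dldpc_encode` are hypothetical names, and the linear system is assumed to be consistent (e.g., $H_c$ has an invertible part on its last $M_c$ columns).

```python
import numpy as np

def gf2_solve(A, b):
    """Return one solution x of A x = b over GF(2); raise if inconsistent."""
    A = np.asarray(A, dtype=np.uint8) % 2
    b = np.asarray(b, dtype=np.uint8) % 2
    m, n = A.shape
    aug = np.concatenate([A, b[:, None]], axis=1)
    pivots, r = [], 0
    for col in range(n):
        if r == m:
            break
        cand = np.flatnonzero(aug[r:, col])
        if cand.size == 0:
            continue                          # free variable, left at 0
        p = r + cand[0]
        aug[[r, p]] = aug[[p, r]]             # move the pivot row up
        rows = np.flatnonzero(aug[:, col])
        rows = rows[rows != r]
        aug[rows] ^= aug[r]                   # clear this column in all other rows
        pivots.append(col)
        r += 1
    if aug[r:, -1].any():
        raise ValueError("inconsistent system")
    x = np.zeros(n, dtype=np.uint8)
    for row, col in enumerate(pivots):
        x[col] = aug[row, -1]
    return x

def dldpc_encode(s, H_s, H_c, H_l2):
    """Sketch: c = s H_s^T, then d with d[:M_s] = c and s H_l2^T + d H_c^T = 0."""
    s = np.asarray(s, dtype=np.int64)
    H_s, H_c, H_l2 = (np.asarray(H, dtype=np.int64) for H in (H_s, H_c, H_l2))
    M_s, N_c = H_s.shape[0], H_c.shape[1]
    c = s @ H_s.T % 2                         # compressed source bits
    H_l1 = np.zeros((M_s, N_c), dtype=np.int64)
    H_l1[:, :M_s] = np.eye(M_s, dtype=np.int64)
    A = np.vstack([H_l1, H_c])                # stacked constraints on d
    rhs = np.concatenate([c, s @ H_l2.T % 2])
    d = gf2_solve(A, rhs)
    return c.astype(np.uint8), d
```

For realistic block lengths, a structured channel code would be encoded far more cheaply. Elimination is used here only because it works for any consistent $H_c$.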

Joint Group Shuffled Scheduling Decoding Algorithm
It can be observed that the BP and SSSD algorithms mentioned above are both iterative decoding methods between a source decoder and a channel decoder. Shuffled scheduling is applied only within source decoding and within channel decoding, and not to the updates through $H_{l1}$ and $H_{l2}$. Thus, the SSSD algorithm is in fact a separated decoding method.
Here, note that the joint codeword $\mathbf{u} = [\mathbf{s}\ \mathbf{d}]$ satisfies
$$\mathbf{u} H_J^T = \mathbf{0}.$$
Thus, the combined Tanner graph of the source part and the channel part can be regarded as a joint Tanner graph. Applying BP decoding to this joint Tanner graph of the D-LDPC codes system yields what we denote as the joint BP (JBP) decoding algorithm. To describe the JBP algorithm, several types of LLRs are defined below for the $k$-th iteration.
• $z^s_n$ represents the LLR of the $n$-th bit of the original source $\mathbf{s}$.
• $z^d_n$ represents the LLR of the $n$-th bit of the codeword $\mathbf{d}$.
• $\varepsilon^k_{mn}$ represents the LLR from the $m$-th CN to the $n$-th VN at the $k$-th iteration.
• $\varphi^k_{mn}$ represents the LLR from the $n$-th VN to the $m$-th CN at the $k$-th iteration.
• $\Phi^k_n$ represents the a posteriori LLR of the $n$-th bit at the $k$-th iteration.
Based on the above definitions, the decoding procedure is described as follows.
Initialization: The initial LLRs of the VNs are calculated by
$$z^s_n = \ln\frac{1-\eta}{\eta}, \quad n = 1, 2, \cdots, N_s,$$
and $z^d_n = 2r_n/\sigma^2$ ($n = 1, 2, \cdots, N_c$), where $r_n$ is the $n$-th received signal and $\sigma^2$ is the noise variance; the LLRs of punctured bits are set to 0. Furthermore, $\sigma^2$ is calculated by
$$\sigma^2 = \frac{1}{2R\,(E_s/N_0)},$$
where $E_s$ is the average transmitted energy per source information bit and $N_0$ is the noise power spectral density.
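The initialization can be written in a few lines. This sketch assumes unit-energy BPSK mapping bit 0 to +1, the LLR convention $\ln(P(0)/P(1))$, and $\sigma^2 = 1/(2R\,E_s/N_0)$ with $E_s$ counted per source bit. `initial_llrs` and its parameters are hypothetical names.

```python
import numpy as np

def initial_llrs(eta, r, es_n0_db, rate, n_s, punctured=()):
    """Initial VN LLRs z = [z^s, z^d] for the joint decoder (sketch).

    Assumes unit-energy BPSK (bit 0 -> +1), sigma^2 = 1 / (2 R Es/N0) with
    Es per source bit, and the LLR convention ln P(0)/P(1).
    """
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    sigma2 = 1.0 / (2.0 * rate * es_n0)
    z_s = np.full(n_s, np.log((1.0 - eta) / eta))    # source prior, identical per bit
    z_d = 2.0 * np.asarray(r, dtype=float) / sigma2   # channel observations
    z_d[np.asarray(punctured, dtype=int)] = 0.0       # punctured bits carry no information
    return np.concatenate([z_s, z_d]), sigma2
```

Source VNs receive no channel observation. Their only initial information is the Bernoulli prior, which is why a smaller $\eta$ (a more redundant source) gives stronger source LLRs.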
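Step 1: C2V update. For each CN $m$ and each $n \in \Theta(m)$, where $\Theta(m)$ is the set of VNs connected to the $m$-th CN,
$$\varepsilon^k_{mn} = 2\tanh^{-1}\Big(\prod_{n' \in \Theta(m) \setminus n} \tanh\big(\varphi^{k-1}_{mn'}/2\big)\Big),$$
with $\varphi^0_{mn} = z_n$.
Step 2: V2C update and hard decision. For each VN $n$ and each $m \in \Theta(n)$, where $\Theta(n)$ is the set of CNs connected to the $n$-th VN,
$$\varphi^k_{mn} = z_n + \sum_{m' \in \Theta(n) \setminus m} \varepsilon^k_{m'n}, \qquad \Phi^k_n = z_n + \sum_{m \in \Theta(n)} \varepsilon^k_{mn},$$
where $z_n = z^s_n$ for $1 \le n \le N_s$ and $z_n = z^d_{n-N_s}$ for $N_s < n \le N_s + N_c$. The estimate is $\hat{u}_n = 0$ if $\Phi^k_n \ge 0$ and $\hat{u}_n = 1$ otherwise.
Step 3: Stopping condition: if $\hat{\mathbf{u}} H_J^T = \mathbf{0}$ or $k = K_{\max}$, the iteration stops; otherwise, set $k = k + 1$ and go to Step 1, where $K_{\max}$ is the maximum number of decoding iterations.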
For the joint shuffled scheduling BP algorithm, the initialization and stopping condition remain the same as in the JBP algorithm; the two algorithms differ only in the updating procedure. When the C2V messages are updated, some $\varphi^k_{mn}$ have already been updated in Step 2 of the current iteration, and they can be used instead of $\varphi^{k-1}_{mn}$ in Step 1 to calculate the remaining $\varepsilon^k_{mn}$. Thus, the C2V update is modified as follows:
$$\varepsilon^k_{mn} = 2\tanh^{-1}\Big(\prod_{n' \in \Theta(m) \setminus n,\ n' < n} \tanh\big(\varphi^{k}_{mn'}/2\big) \prod_{n' \in \Theta(m) \setminus n,\ n' > n} \tanh\big(\varphi^{k-1}_{mn'}/2\big)\Big).$$
However, one iteration of the standard JBP algorithm can be processed fully in parallel, whereas one iteration of the shuffled JBP algorithm is completely serial, which results in a large decoding latency. To decrease the decoding latency while keeping the parallelism of the standard JBP algorithm, the concept of grouping is introduced, and the decoding algorithm is developed into the joint group shuffled scheduling decoding (JGSSD) algorithm. Before BP decoding is performed, the VNs are divided into a number of groups according to certain criteria, i.e.,
$$V = \{V_1, V_2, \cdots, V_{G_H}\},$$
where $V_g = [V_g^i]$ ($g = 1, 2, \cdots, G_H$, $i = 1, 2, \cdots, n_g$), $G_H$ is the number of groups and $n_g$ is the size of $V_g$. The messages within each group are updated in parallel, while the groups are processed serially. Thus, the C2V update becomes
$$\varepsilon^k_{mn} = 2\tanh^{-1}\Big(\prod_{n' \in \Theta(m) \setminus n,\ n' \le N_{g-1}} \tanh\big(\varphi^{k}_{mn'}/2\big) \prod_{n' \in \Theta(m) \setminus n,\ n' > N_{g-1}} \tanh\big(\varphi^{k-1}_{mn'}/2\big)\Big)$$
for $n \in V_g$, i.e., $N_{g-1} < n \le N_{g-1} + n_g$, and $m \in \Theta(n)$. Here, $\Theta(m)$ denotes the set of VNs connected to the $m$-th CN, $\Theta(n)$ denotes the set of CNs connected to the $n$-th VN, $\Theta(m) \setminus n$ denotes the VNs in $\Theta(m)$ excluding the $n$-th VN, and $N_{g-1}$ is the total size of the preceding sets $\{V_1, V_2, \cdots, V_{g-1}\}$ (with $N_0 = 0$).
For the JGSSD algorithm, $V$ can be grouped based on code length. For example, the $N_s + N_c$ VNs can be divided evenly into $G_H$ groups, each containing $(N_s + N_c)/G_H$ VNs (assuming $(N_s + N_c) \bmod G_H = 0$ for simplicity). $V$ can also be grouped according to the types of VNs.
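The group-shuffled C2V rule can be sketched directly on a parity-check matrix. The following is a minimal, unoptimized Python/NumPy sketch of group-shuffled sum-product decoding. It assumes the standard tanh-rule C2V update and the LLR convention $\ln(P(0)/P(1))$, and `group_shuffled_bp` and its arguments are hypothetical. Within a group, all C2V messages are computed from the same snapshot of V2C messages (parallel). Groups are processed one after another (serial), so VNs of earlier groups already contribute iteration-$k$ messages.

```python
import numpy as np

def group_shuffled_bp(H, llr, groups, max_iter=30):
    """Group-shuffled sum-product decoding (sketch).

    H      : (M, N) binary parity-check matrix, e.g. the joint matrix H_J
    llr    : length-N initial LLRs z_n (ln P(0)/P(1)); 0 for punctured bits
    groups : list of index arrays partitioning range(N), processed in order
    Returns (hard decisions, iterations used).
    """
    H = np.asarray(H, dtype=np.int64)
    llr = np.asarray(llr, dtype=float)
    M, N = H.shape
    rows, cols = np.nonzero(H)                  # one entry per Tanner-graph edge
    row_edges = [np.flatnonzero(rows == m) for m in range(M)]
    col_edges = [np.flatnonzero(cols == n) for n in range(N)]
    phi = llr[cols]                             # V2C messages, start at z_n
    eps = np.zeros(len(rows))                   # C2V messages
    for k in range(1, max_iter + 1):
        for g in groups:
            # C2V for all edges of this group from one snapshot of phi:
            # earlier groups already hold iteration-k messages, the rest k-1
            t = np.tanh(phi / 2.0)
            for n in g:
                for e in col_edges[n]:
                    others = row_edges[rows[e]]
                    prod = np.prod(t[others[others != e]])
                    eps[e] = 2.0 * np.arctanh(np.clip(prod, -1 + 1e-12, 1 - 1e-12))
            # V2C for the VNs of this group (extrinsic: exclude the target CN)
            for n in g:
                e = col_edges[n]
                phi[e] = llr[n] + eps[e].sum() - eps[e]
        post = llr + np.bincount(cols, weights=eps, minlength=N)   # Phi_n
        u_hat = (post < 0).astype(np.int64)
        if not (H @ u_hat % 2).any():           # all parity checks satisfied
            break
    return u_hat, k
```

With `groups=[np.arange(N)]` the function reduces to flooding JBP, and with one VN per group it becomes fully serial shuffled decoding, mirroring the special cases discussed in the text.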
For example, let $V = \{V_1, V_2\}$, where $V_1$ is the set of the $N_s$ source VNs and $V_2$ is the set of the $N_c$ channel VNs; in this case, the JGSSD algorithm becomes the IBP algorithm. If $V_1$ and $V_2$ are further divided into $N_s$ and $N_c$ groups, respectively, the algorithm reduces to the SSSD algorithm.
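The two grouping rules described above can be written as small helpers that return a list of index arrays, one per group, to be processed in order. This is a sketch with hypothetical function names. VN indices are assumed to list the source VNs first ($0, \dots, N_s - 1$) and then the channel VNs.

```python
import numpy as np

def groups_by_length(n_vns, n_groups):
    """Split VN indices 0..n_vns-1 into n_groups consecutive equal-size groups."""
    if n_vns % n_groups:
        raise ValueError("assumes (N_s + N_c) mod G_H == 0")
    return np.split(np.arange(n_vns), n_groups)

def groups_by_type(n_s, n_c, singletons=False):
    """Source VNs (0..N_s-1) before channel VNs (N_s..N_s+N_c-1)."""
    src = np.arange(n_s)
    ch = np.arange(n_s, n_s + n_c)
    if not singletons:
        return [src, ch]                        # G_H = 2: source group, then channel group
    # Every VN in its own group, source part first, then channel part
    return [np.array([v]) for v in np.concatenate([src, ch])]
```

For instance, `groups_by_length(6400, 400)` yields 400 groups of 16 VNs each.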

Joint Shuffled Extrinsic Information Transfer Algorithm
EXIT analysis can predict the limiting performance of a D-LDPC codes system by calculating the decoding threshold of its protograph. An EXIT algorithm for analyzing the decoding threshold of the D-LDPC codes system with shuffled scheduling was proposed in [34], but it only considers $B_J$ with $B_{l2} = 0$. Moreover, that EXIT algorithm consists of separate source and channel parts, just like the decoding procedure. Thus, it is not suitable for a D-LDPC codes system with a general structure and the JGSSD algorithm. In this section, a joint shuffled extrinsic information transfer (JSEXIT) algorithm is proposed.
First, five types of mutual information (MI) are defined:
• $I^{Ev}_{ij,(k)}$: the extrinsic MI from the $j$-th VN to the $i$-th CN at the $k$-th iteration;
• $I^{Ec}_{ij,(k)}$: the extrinsic MI from the $i$-th CN to the $j$-th VN at the $k$-th iteration;
• $I^{Av}_{ij,(k)}$: the a priori MI of the $j$-th VN, received from the $i$-th CN, at the $k$-th iteration;
• $I^{Ac}_{ij,(k)}$: the a priori MI of the $i$-th CN, received from the $j$-th VN, at the $k$-th iteration;
• $I^{APP}_{j,(k)}$: the MI between the a posteriori LLR evaluated by the $j$-th VN and the corresponding source bit $s_j$ at the $k$-th iteration.
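In addition, an indicator function is defined for the VNs, and if a VN is punctured, its initial LLR value is 0. Moreover, $J(\sigma_{ch})$ represents the MI between a binary bit and its corresponding LLR value $L_{ch} \sim \mathcal{N}(\sigma_{ch}^2/2, \sigma_{ch}^2)$, where $\mathcal{N}(\theta, \sigma^2)$ denotes the Gaussian distribution with expectation $\theta$ and variance $\sigma^2$. It is given by [2]
$$J(\sigma_{ch}) = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_{ch}^2}}\, e^{-\frac{(\xi - \sigma_{ch}^2/2)^2}{2\sigma_{ch}^2}} \log_2\big(1 + e^{-\xi}\big)\, d\xi.$$
Then, the VNs of the joint protograph are divided into a number of groups according to certain criteria, i.e., $v = \{v_1, v_2, \cdots, v_{G_B}\}$, where $v_g = [v_g^i]$ ($g = 1, 2, \cdots, G_B$, $i = 1, 2, \cdots, t_g$), $G_B$ is the number of groups and $t_g$ is the size of $v_g$.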
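In practice, $J(\cdot)$ and its inverse are evaluated numerically. The sketch below shows one possible implementation. It assumes Gauss-Hermite quadrature (60 nodes from `numpy.polynomial.hermite.hermgauss`) for the expectation in $J$ and bisection for $J^{-1}$; closed-form curve-fit approximations are a common alternative. The function names are hypothetical.

```python
import numpy as np

# Gauss-Hermite rule: integral of exp(-x^2) f(x) dx ~= sum(w * f(x))
_X, _W = np.polynomial.hermite.hermgauss(60)

def J(sigma):
    """MI between a bit and an LLR L ~ N(sigma^2/2, sigma^2)."""
    if sigma <= 0:
        return 0.0
    L = sigma ** 2 / 2 + np.sqrt(2.0) * sigma * _X    # change of variables
    # log2(1 + e^{-L}) computed stably as logaddexp(0, -L) / ln 2
    expectation = np.sum(_W * np.logaddexp(0.0, -L)) / (np.sqrt(np.pi) * np.log(2.0))
    return 1.0 - expectation

def J_inv(I, sigma_max=100.0, n_steps=60):
    """Numerical inverse of J by bisection (J is increasing in sigma)."""
    lo, hi = 0.0, sigma_max
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if J(mid) < I:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The substitution $\xi = \sigma^2/2 + \sqrt{2}\,\sigma x$ turns the Gaussian expectation into the $e^{-x^2}$ weight expected by Gauss-Hermite quadrature.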
Finally, the proposed JSEXIT algorithm for the D-LDPC codes system over the AWGN channel is described as follows.
For $1 \le j \le n_s + n_c$ and $1 \le i \le m_s + m_c$:
Step 2: The MI update from CNs to VNs, after which $I^{Av}_{ij,(k)} = I^{Ec}_{ij,(k)}$ is set. Here, $\theta(i) \setminus j$ denotes all VNs connected to the $i$-th CN excluding the $j$-th VN, and $T_{g-1}$ is the total size of the preceding sets $\{v_1, v_2, \cdots, v_{g-1}\}$. Note that, in the calculation of $I^{Ec}_{ij,(k)}$, the already updated part of $I^{Ac}_{is,(k)}$ (i.e., from VNs in the preceding groups) replaces $I^{Ac}_{is,(k-1)}$; this is reflected in the different calculations of $\alpha^{(k-1)}$ and $\alpha^{(k)}$.
Step 3: The APP-LLR MI evaluation, for $1 \le j \le n_s$ and $1 \le i \le m_s$.
Steps 1 to 3 are performed iteratively until $I^{APP}_j = 1$ or the maximum number of iterations is reached.
Remarks: If the maximum number of iterations is set to a large value, as in the conventional EXIT algorithm, the JSEXIT algorithm cannot reflect the advantage of shuffled scheduling, because with a large number of iterations, shuffled scheduling decoding performs similarly to the standard BP algorithm. For this reason, the maximum number of iterations is set to 20 here. Consequently, the decoding thresholds show a gap relative to those of the conventional EXIT algorithm, but they still provide a meaningful comparison between grouping strategies.
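A compact numerical sketch of a group-shuffled protograph EXIT recursion is given below. It is not the paper's JSEXIT implementation. It assumes the standard PEXIT update rules with Gaussian-approximated messages and treats every VN as having a Gaussian channel input with parameter $\sigma_{ch}$ (0 for punctured VNs). It does not model the non-Gaussian prior of source VNs or the indicator function. Groups are processed in order with in-place updates, so the CN updates for a group already see the VN-to-CN MI refreshed by earlier groups in the same iteration. This is the effect that the text captures with $\alpha^{(k)}$ versus $\alpha^{(k-1)}$. The default `max_iter=20` follows the remark above, and all names are hypothetical.

```python
import numpy as np

_X, _W = np.polynomial.hermite.hermgauss(60)

def J(sigma):
    # MI of a consistent Gaussian LLR N(sigma^2/2, sigma^2), Gauss-Hermite quadrature
    if sigma <= 0:
        return 0.0
    L = sigma ** 2 / 2 + np.sqrt(2.0) * sigma * _X
    return 1.0 - np.sum(_W * np.logaddexp(0.0, -L)) / (np.sqrt(np.pi) * np.log(2.0))

def J_inv(I, lo=0.0, hi=100.0):
    for _ in range(60):                        # bisection; J is increasing
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

def group_shuffled_pexit(B, sigma_ch, groups, max_iter=20, tol=1e-6):
    """Return (converged, iterations, APP MI per VN) for protomatrix B (sketch)."""
    B = np.asarray(B, dtype=int)
    sigma_ch = np.asarray(sigma_ch, dtype=float)
    m, n = B.shape
    Iv = np.zeros((m, n))                      # VN -> CN MI (a priori MI of the CN)
    Ic = np.zeros((m, n))                      # CN -> VN MI (a priori MI of the VN)
    for i, j in zip(*np.nonzero(B)):
        Iv[i, j] = J(sigma_ch[j])              # channel information only, at the start
    app = np.zeros(n)
    for k in range(1, max_iter + 1):
        for g in groups:
            # CN -> VN for the columns of group g; Iv already holds the
            # iteration-k values of earlier groups (in-place update)
            for j in g:
                for i in np.flatnonzero(B[:, j]):
                    acc = sum(B[i, s] * J_inv(1 - Iv[i, s]) ** 2
                              for s in np.flatnonzero(B[i]) if s != j)
                    acc += (B[i, j] - 1) * J_inv(1 - Iv[i, j]) ** 2   # parallel edges
                    Ic[i, j] = 1 - J(np.sqrt(acc))
            # VN -> CN for the columns of group g
            for j in g:
                col = np.flatnonzero(B[:, j])
                for i in col:
                    acc = sum(B[t, j] * J_inv(Ic[t, j]) ** 2 for t in col if t != i)
                    acc += (B[i, j] - 1) * J_inv(Ic[i, j]) ** 2
                    Iv[i, j] = J(np.sqrt(acc + sigma_ch[j] ** 2))
        app = np.array([J(np.sqrt(sum(B[t, j] * J_inv(Ic[t, j]) ** 2
                                      for t in np.flatnonzero(B[:, j]))
                                  + sigma_ch[j] ** 2)) for j in range(n)])
        if np.all(app >= 1 - tol):
            return True, k, app
    return False, max_iter, app
```

For the (3,6)-regular protograph $B = [3\ 3]$ over BPSK/AWGN ($\sigma_{ch} = 2/\sigma$), `groups=[[0, 1]]` gives the flooding schedule and `groups=[[0], [1]]` a shuffled one. Because all updates are monotone, the shuffled schedule cannot need more iterations than flooding to converge.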

Decoding Threshold Calculation
The MI $I^{APP}_j$ can be viewed as a function of the independent variables $B_J$, $\eta$, $\sigma_{ch}$ and $G_B$, i.e.,
$$I^{APP}_j = f(B_J, \eta, \sigma_{ch}, G_B),$$
where $\sigma_{ch}$ can be calculated from $E_s/N_0$. The channel decoding threshold $(E_s/N_0)_{th}$ indicates the performance in the waterfall region; it is the minimum value of $E_s/N_0$ that makes all $I^{APP}_j \to 1$ for a given $\eta$. The source decoding threshold $\eta_{th}$ indicates the error floor level; it is the maximum value of $\eta$ that makes all $I^{APP}_j \to 1$ when $E_s/N_0 \to \infty$. Both $(E_s/N_0)_{th}$ and $\eta_{th}$ are calculated for different values of $G_B$. Without loss of generality, two examples using regular LDPC codes with degree-3 VNs as the source and channel codes are presented, where $B_{J1}$ has $B_{l2} = 0$ and $B_{J2}$ has $B_{l2} \neq 0$; both joint protographs are built from the same regular source and channel protographs and the linking protomatrix $B_{l1}$. The source decoding thresholds $\eta_{th}$ and channel decoding thresholds $(E_s/N_0)_{th}$ for different grouping strategies are calculated and listed in Tables 2 and 3. It can be seen that 1) the JSEXIT algorithm can calculate the decoding thresholds for both $B_{l2} = 0$ and $B_{l2} \neq 0$; 2) as $G_B$ increases, the source decoding threshold becomes larger and the channel decoding threshold becomes smaller, i.e., both improve. For example, for $B_{J2}$ with source statistic $\eta = 0.11$, the case of $G_B = 16$ outperforms the cases of $G_B = 8$, $G_B = 4$, $G_B = 2$ and $G_B = 1$ by 0.12 dB, 0.35 dB, 1.17 dB and 1.27 dB, respectively.
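Both thresholds are boundary points of a monotone convergence test, so they are typically located by bisection. The sketch below assumes a user-supplied predicate `converges(x)`, for example "all $I^{APP}_j \ge 1 - \text{tol}$ within the iteration limit" evaluated by an EXIT recursion. The predicate and the function name are hypothetical, and the usage below relies on toy predicates only.

```python
def threshold_bisect(converges, lo, hi, smallest=True, tol=1e-3):
    """Bisection for a decoding threshold (sketch).

    converges(x) -> bool must be monotone on [lo, hi]:
      smallest=True : return (approximately) the minimum convergent x, e.g. (Es/N0)_th in dB
      smallest=False: return (approximately) the maximum convergent x, e.g. eta_th
    """
    if (smallest and not converges(hi)) or (not smallest and not converges(lo)):
        raise ValueError("no convergent point in the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if converges(mid) == smallest:
            hi = mid          # threshold lies in [lo, mid]
        else:
            lo = mid          # threshold lies in [mid, hi]
    return hi if smallest else lo   # always return the convergent end

# Usage with toy predicates standing in for an EXIT evaluation
snr_th = threshold_bisect(lambda snr: snr >= -2.345, -6.0, 2.0, smallest=True)
eta_th = threshold_bisect(lambda eta: eta <= 0.123, 0.0, 0.5, smallest=False)
```

Returning the convergent end of the final interval means the reported threshold is never on the failing side of the true boundary, up to the tolerance.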

Simulation and Comparison
In this section, the advantages of the JGSSD algorithm are illustrated through Monte Carlo simulations and analyses of the iteration number and decoding latency. In all simulations, the length of the source sequence is 3200, so the lifting factor of the PEG algorithm for $B_{J1}$ and $B_{J2}$ is 800. The maximum number of decoding iterations $K_{\max}$ is set to 30.

BER Performance
The BER performance of $B_{J1}$ and $B_{J2}$ with different grouping strategies is shown in Figures 2-5. First, the JGSSD algorithm is applicable to both $B_{l2} = 0$ and $B_{l2} \neq 0$. Second, the BER performance in the waterfall region improves as $G_H$ increases, which is in line with the EXIT analysis. Note that the case of $G_H = 3200$ is equivalent to the SSSD algorithm in [7,8]. For $B_{J1}$ with source statistic $\eta = 0.04$, at a BER of $2 \times 10^{-7}$, the case of $G_H = 400$ has a coding gain of 0.2 dB over the case of $G_H = 1$ and shows no significant difference from the cases of $G_H = 3200$ and $G_H = 6400$. Other comparisons are listed in Table 4. Third, the error floor level also becomes lower as $G_H$ increases, which is again in line with the EXIT analysis. For $B_{J2}$ with source statistic $\eta = 0.11$, the case of $G_H = 8$ is better than that of $G_H = 1$ but worse than that of $G_H = 6400$, while the cases of $G_H = 400$ and $G_H = 3200$ have almost the same error floor level as that of $G_H = 6400$.

Decoding Complexity and Latency
Without loss of generality, the case of $B_{J1}$ with source statistic $\eta = 0.04$ is taken to compare the decoding complexity and latency. The decoding complexity can be evaluated by the number of decoding iterations, so the average iteration number $K_{avg}$ for different grouping strategies is shown in Figure 6. As $E_s/N_0$ increases, $K_{avg}$ decreases, but the rate of decrease slows down. A larger $G_H$ yields a smaller $K_{avg}$. For example, at $E_s/N_0 = -2.4$ dB, the cases of $G_H = 400$ and $G_H = 8$ reduce $K_{avg}$ by 40% and 34%, respectively, compared with the case of $G_H = 1$, as shown in Table 5.
The decoding latency indicates the average time taken for the information bits to be decoded. The decoding procedure consists of C2V and V2C updates, and the total number of these updates depends on the average degrees of the CNs and VNs. Since the same $B_{J1}$ is used here, the time to process the C2V and V2C updates of one group is the same for all grouping strategies, denoted as $t_{uni}$. Because the shuffled scheduling decoding algorithm is serial while the BP decoding algorithm is parallel, different grouping strategies correspond to different combinations of serial and parallel processing. Take the case of $G_H = 400$ as an example: the 400 groups perform the decoding procedure one after another, and each group is decoded in parallel. Thus, one iteration takes $400\,t_{uni}$. If the average number of iterations is $K_{avg}$, the average decoding time of the information bits is
$$T_{avg} = G_H \cdot K_{avg} \cdot t_{uni}.$$
As shown in Table 5, the decoding latency increases with $G_H$. A reasonable decoding solution, i.e., grouping strategy, should therefore take BER performance, decoding complexity and decoding latency into account.
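Under the latency model $T_{avg} = G_H \cdot K_{avg} \cdot t_{uni}$ with $t_{uni}$ independent of the grouping, the trade-off can be checked with the $K_{avg}$ reductions quoted above (34% for $G_H = 8$ and 40% for $G_H = 400$, relative to $G_H = 1$ at $E_s/N_0 = -2.4$ dB). The helper name is hypothetical.

```python
def relative_latency(G_H, k_reduction, G_ref=1):
    """T_avg(G_H) / T_avg(G_ref) with T_avg = G_H * K_avg * t_uni.

    k_reduction: fractional drop of K_avg versus the reference (0.40 means 40% fewer).
    t_uni cancels because it is assumed identical for every grouping.
    """
    return (G_H / G_ref) * (1.0 - k_reduction)

for G_H, red in [(1, 0.0), (8, 0.34), (400, 0.40)]:
    print(G_H, round(relative_latency(G_H, red), 2))
```

This gives about 5.3 times and 240 times the latency of $G_H = 1$. A moderate $G_H$ therefore keeps most of the iteration savings at a much smaller latency cost.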

Conclusions
In this paper, a JGSSD algorithm is proposed for the D-LDPC codes system, which considers the D-LDPC coding structure as a whole. To analyze the performance of different grouping strategies, a JSEXIT algorithm is proposed for the general D-LDPC coding structure, i.e., for both $B_{l2} = 0$ and $B_{l2} \neq 0$. Both the source decoding threshold and the channel decoding threshold improve as $G_B$ increases. The BER simulations are in line with the EXIT analysis, i.e., a larger $G_H$ gives a better coding gain or a lower error floor level. In addition, the decoding complexity and decoding latency are compared, showing that a larger $G_H$ gives a lower decoding complexity but a higher decoding latency. Thus, a suitable shuffled scheduling decoding algorithm should take performance, complexity and latency into overall consideration, and the JGSSD algorithm provides a flexible choice. In the future, the performance of the JGSSD algorithm can be studied for specific applications, such as various fading channels, and the D-LDPC coding structure can be optimized for different grouping strategies with the aid of the JSEXIT algorithm.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript: